Information bottleneck
Claude Shannon's rate-distortion theory formalizes the trade-off between compression and the preservation of meaning (e.g., how higher-level categories can compress while preserving meaning). Namely, given data $X$ and a compressed representation $T$, it seeks encodings that minimize $R + \beta D$, where $R$ is the rate (number of bits), $D$ is the distortion, and $\beta$ is a trade-off weight.
- [ ] Why do we want to minimize mutual information here?
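A minimal sketch of the rate-distortion trade-off for a discrete source, assuming the rate is measured as the mutual information $I(X;T)$ in bits (the usual rate-distortion formulation) and the cost is $R + \beta D$. The function names, the binary source, and the Hamming distortion are illustrative choices, not taken from any particular library.

```python
import math

def mutual_information(joint):
    """I(A;B) in bits for a joint distribution given as joint[a][b]."""
    p_a = [sum(row) for row in joint]
    p_b = [sum(col) for col in zip(*joint)]
    mi = 0.0
    for a, row in enumerate(joint):
        for b, p in enumerate(row):
            if p > 0:  # zero-probability cells contribute nothing
                mi += p * math.log2(p / (p_a[a] * p_b[b]))
    return mi

def rate_distortion_cost(p_x, encoder, distortion, beta):
    """Return (R, D, R + beta * D) for a stochastic encoder p(t|x).

    encoder[x][t] is p(t|x); distortion[x][t] is the cost d(x, t).
    """
    n_x, n_t = len(p_x), len(encoder[0])
    joint_xt = [[p_x[x] * encoder[x][t] for t in range(n_t)] for x in range(n_x)]
    rate = mutual_information(joint_xt)
    dist = sum(joint_xt[x][t] * distortion[x][t]
               for x in range(n_x) for t in range(n_t))
    return rate, dist, rate + beta * dist

# Uniform binary source with Hamming distortion (assumed for illustration).
p_x = [0.5, 0.5]
hamming = [[0, 1], [1, 0]]

identity = [[1, 0], [0, 1]]   # keeps every bit: high rate, no distortion
constant = [[1, 0], [1, 0]]   # always outputs 0: zero rate, high distortion

r_id, d_id, cost_id = rate_distortion_cost(p_x, identity, hamming, beta=1.0)
r_c, d_c, cost_c = rate_distortion_cost(p_x, constant, hamming, beta=1.0)
```

The identity encoder spends 1 bit and incurs no distortion, while the constant encoder spends 0 bits and is wrong half the time; $\beta$ decides which of the two is cheaper.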
Information bottleneck theory then proposes minimizing the following when compressing $X$ into $T$ while preserving information about a relevant variable $Y$:

$$\min_{p(t \mid x)} \; I(X;T) - \beta \, I(T;Y)$$
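A small sketch that evaluates this kind of objective on a toy example, assuming the standard information bottleneck Lagrangian $I(X;T) - \beta I(T;Y)$ with all quantities in bits. The toy distribution (four items $X$ grouped into two categories $Y$) and the function names are hypothetical, chosen to mirror the "higher-level categories" intuition.

```python
import math

def mutual_information(joint):
    """I(A;B) in bits for a joint distribution given as joint[a][b]."""
    p_a = [sum(row) for row in joint]
    p_b = [sum(col) for col in zip(*joint)]
    mi = 0.0
    for a, row in enumerate(joint):
        for b, p in enumerate(row):
            if p > 0:  # zero-probability cells contribute nothing
                mi += p * math.log2(p / (p_a[a] * p_b[b]))
    return mi

def ib_objective(p_xy, encoder, beta):
    """Return (I(X;T), I(T;Y), I(X;T) - beta * I(T;Y)) for an encoder p(t|x)."""
    n_x, n_y, n_t = len(p_xy), len(p_xy[0]), len(encoder[0])
    p_x = [sum(row) for row in p_xy]
    # p(x, t) = p(x) p(t|x)
    joint_xt = [[p_x[x] * encoder[x][t] for t in range(n_t)] for x in range(n_x)]
    # p(t, y) = sum_x p(t|x) p(x, y), since T depends on Y only through X
    joint_ty = [[sum(encoder[x][t] * p_xy[x][y] for x in range(n_x))
                 for y in range(n_y)] for t in range(n_t)]
    i_xt = mutual_information(joint_xt)
    i_ty = mutual_information(joint_ty)
    return i_xt, i_ty, i_xt - beta * i_ty

# Toy data: X uniform over 4 items, Y = X // 2 is the item's category.
p_xy = [[0.25, 0.0], [0.25, 0.0], [0.0, 0.25], [0.0, 0.25]]

identity = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]  # T = X
merged = [[1, 0], [1, 0], [0, 1], [0, 1]]                            # T = category

ixt_id, ity_id, loss_id = ib_objective(p_xy, identity, beta=2.0)
ixt_m, ity_m, loss_m = ib_objective(p_xy, merged, beta=2.0)
```

Both encoders keep the full 1 bit about $Y$, but the merged encoder only spends 1 bit on $X$ instead of 2, so it scores lower on the objective.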